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The molecular mechanisms underlying viral pathogenesis are yet poorly understood owed to the large 
number of factors involved and the complexity of their interactions. Could we identify a minimal set of host 
transcription factors (TF) whose misregulation would result in the transcriptional profile characteristic of 
infected cells in absence of the virus? How many of such sets exist? Are all orthogonal or share critical TFs 
involved in specific biological functions? We have developed a computational methodology that uses a 
quantitative model of the transcriptional regulatory network (TRN) of Arabidopsis thaliana to explore the 
landscape of all possible re-engineered TRNs whose transcriptomic profiles mimic those observed in 
infected plants. We found core sets containing between six and 34 TFs, depending on the virus, whose in 
silico knockout or overexpression in the TRN resulted in transcriptional profiles that minimally deviate 
from those observed in infected plants. 

For over decades, plant molecular virology has been overly focused on the pathogen itself, studying their 
individual genes and products, and their local effects on certain regulatory pathways related to antiviral 
responses or to symptoms development. The symptoms of viral infections result from the sum of direct 
effects (e.g., cytoplasmic accumulation of viral products or modifications in the cytoskeleton or in membrane 
structures) as well as of indirect effects from the alteration of host physiology, notably by diverting almost every 
cellular resource to the production of virus-specific components, and by actively suppressing host defenses 1,2 . The 
arrival of genomic tools have allowed high-throughput genetic and metabolic screenings, providing unpreced- 
ented views of the plant host-virus interactions from a systemic perspective that would allow for a deeper 
understanding on how host and virus genotypes interplay in determining the pathological outcome of an 
infection 37 . 

Microarray-based functional genomics, which provides a global view of transcriptional changes in host cells, 
has been the most commonly used method to study global changes during plant-virus interactions 2,816 . As a 
response to infection, hosts compensate by over- or under-expressing certain cellular pathways, and deploying 
specific antiviral measures. Collectively, these alterations determine the type and strength of symptoms displayed 
by infected organisms as well as the virulence of the infection. Imposing the measured transcriptional changes in a 
biological network context, it was confirmed that host cells undergo a significant reprogramming of their 
transcriptome during infection 17,18 , which is possibly a central requirement for the mounting of host defenses. 
Moreover, Rodrigo et at uncovered a general mode of plant virus action in which perturbations preferentially 
affect genes that are highly connected, central and organized in modules 19 , a mechanism of action that has been 
pervasively described for animal viruses 20 26 . 

Inspired by an integrated computational-experimental approach for discovering genes and pathways that are 
targets of specific compounds 27 , herein, we aimed to computationally re-design the transcriptional regulatory 
network (TRN) of Arabidopsis thaliana by altering key transcription factors (TFs) in order to mimic the tran- 
scriptional response observed upon infecting the plant with several different virus. We will accomplish this goal 
by re-designing optimal genetic network using as starting point a genome-scale TRN model of the plant 28 . Hence, 
those computational re-designs will provide new insights about mechanisms related with virus-target interactions 
in the plant. Recently, many groups have proposed and implemented different approaches for genome-wide 
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re-design, by knocking out and over-expressing genes, of prokaryotes 
and eukaryotes to control global gene expression 29 32 . Following this 
synthetic biology strategy, herein we have computationally re- 
designed A. thaliana TRN by exhaustively exploring multiple gene 
perturbations in the form of gene knockouts or over-expressions. 
Hence, we have corroborated in silico that several genetic modifica- 
tions imposed on a critical set of TFs generates a high diversity in the 
transcriptome of the plant. 

Could a reduced set of perturbed TFs mimic the plant's transcrip- 
tional response to viral infections? It is of outmost importance to 
harness the ability of using computational design to predict and 
optimize a la carte synthetic genomes with desired transcriptional 
responses (Figure 1). To address this question, we have developed an 
algorithm that uses as starting point a wild-type transcription regu- 
lation model, inferred from high-throughput microarray data 28 . This 
TRN is evolved using a heuristic optimization method that at each 
stage computes the updated gene expression profile and compares it 
with the one observed during viral infection. With this approach, we 
explored the space of re-engineered TRNs to find the optimal global 
network whose predicted transcriptional profile has a minimal dis- 
tance to the one characteristic of viral infections. Consequently, the 
use of genomic techniques to develop design-guided models, and the 
application of reverse-engineering methods, open the doors for 
delineating a high-resolution picture of host-pathogen interactions. 

Results 

We have developed a methodology to automatically re-design the 
TRN of A. thaliana to mimic the transcriptomic changes induced by 
perturbations. In particular, we have focused on the perturbations 
induced by the infection with a set of eight different viruses. For that, 
we hypothesized that symptoms of viral infections could be recreated 



in absence of the pathogenic agent by altering a minimal core set of 
TFs (Figure IB). We used a genome-wide model of A. thaliana gene 
transcription based on ordinary differential equations (ODEs) to 
predict changes in gene expression after modifications such as TF 
knockout or overexpression 28 . This model contains 21929 genes 
(—82% of all non-redundant genes in A. thaliana), 1187 of which 
are putative TFs (—75% of all A. thaliana TFs) and consequently, are 
the potential candidates to be perturbed. Figure 2 shows a schematic 
representation of the evolutionary algorithm implemented, in which 
single gene mutations were made and then selected according to the 
fitness functions discussed in the Methods section. This algorithm 
explores the landscape of all possible TRN produced by over-expres- 
sing or knocking out A. thaliana TFs. Operationally, these perturba- 
tions were introduced by modifying the corresponding ODEs. At 
each evolutionary step, a population of such perturbed TRNs was 
generated and their corresponding expression profiles computed and 
compared with the target transcriptomic profile observed for each 
viral infection. Those TRNs showing the better matching with the 
target were selected for the next round of optimization, as thoroughly 
discussed in the Methods section. Fifty independent runs of this 
evolutionary optimization were generated to evaluate the conver- 
gence of the results. 

Re-engineered TRNs that mimic the transcriptomic response 
characteristic of different viral infections. Figure 3 illustrates that 
re-engineered TRNs actually show expression profiles that are 
significantly closer to those observed in infected plants than the 
profile inferred for non-infected plants (wild-type). Different 
panels correspond to the optimizations done for the eight different 
viruses using the scoring method that takes into consideration only 
perturbations in TFs (Figure 2). The ordinates axis shows the 
computed expression score (S exp ). A perfect matching between the 
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Figure 1 | Schematic representation of the methodology followed for this study. (A) Reverse-engineering to reveal gene sub-networks differentially 
altered by viral infection. (B) Reprogramming cells to mimic the plant transcriptomic responses observed upon viral infection by using computational 
genome redesign. 
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Figure 2 | Computational methodology. Schematic representation of the algorithm used to predict optimal redesigns of the A. thaliana TRN that mimics 
the alterations induced by viral infections on the plant transcriptome. 



observed TRN and that computed for the re-engineered genome 
would have S ex „ = 0. The larger the departure from zero, the 
poorer the matching between observed and predicted expression 
profiles. The black vertical line corresponds to the score computed 
for the wild-type TRN (i.e., non-infected plants). In blue, we show the 
observed distribution of scores after multiple optimization runs. In 
all cases, the optimization results in artificial TRNs that are closer to 
the observed transcriptome than to the wild-type. The best fit was 
obtained for TRV (S exp < 1.96%; P < 0.001) and the worst for PPV 
(S exp < 9.51%; P < 0.001). As an additional quality control, we also 
ran the optimization algorithm but without the selective constraint 
imposed by matching the observed transcriptional profiles (red bars 
in Figure 3). In these cases, the distributions of S exp did not show any 
improvement but, instead, had average values larger than those 
observed for the wild-type TRN. As we would expect, considering 
not only TFs in the scoring function but all genes, predictability is not 
significantly improved, ranging between S exp < 1.88% (P < 0.001) 
for CaLCuV and S exp < 10.36% (P < 0.001) for PPV (Figure SI). 
Indeed, the S exp obtained only with TFs or with all genes are highly 
correlated (Figure S2; r = 0.714, 6 d.f., P = 0.046), thus suggesting 



that the conclusion is robust to the choice of genes to be perturbed 
during the optimization process. Interestingly, this conclusion does 
not specifically hold for infection with TCV (Figure S2). For this 
virus, using only TFs results in a poor optimization. To better 
illustrate the matching between the gene expression of the re- 
engineered TRNs and those of the corresponding virus-infected 
plants, Figure 4 shows the TFs expression values for the different 
predicted TRNs. Hence, most of re-designed TRNs guarantee TFs 
expression patterns highly similar to those measured in the virus- 
infected plants, and quite different from those observed in the non- 
infected plants. 

A minimal set of transcriptional factors guarantees TRN re- 
designs that mimic viral infections. Transcriptomic studies have 
shown that the number of A. thaliana TFs altered upon viral 
infection varies among different viruses, with TRV altering 1 1 and 
TEV-Arl7 altering 101 (gray bars in the upper panel of Figure 5A) 4 ' 19 . 
What would be the number of TFs whose expression has been altered 
in the re-engineered TRNs? To answer this question we simply 
counted the number of the transcriptional modification done on 
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Figure 3 | Distributions of expression scores, S exp . Histogram of the scores obtained in 50 optimization processes (blue bars) forced to mimic the plant 
transcriptome changes observed after infection with eight different viruses. In the optimization process, only TFs were considered in the scoring function 
(S). Random simulations were computed without imposing any selective pressure (red bars). Black line shows the score obtained using the transcriptional 
model inferred of A. thaliana. 
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Figure 4 | Heat maps comparing the expression profiles of TFs. Expression profiles of the TFs measured in the infected plants, and predicted in the best- 
optimized and random transcriptomes. Note that under- and over-expressed genes are plotted in red and green colors, respectively. 



each designed TRN. Figure 5A shows this information for each virus, 
both for TRNs evolved with the scoring function that only accounts 
for changes in TFs (blue bars) and for the scoring function that 
accounts for alterations in all genes (red bars). Surprisingly, after 
optimization, the number of altered TFs necessary to mimic viral 
infections was quite reduced for all viruses. For example, looking 
only at TRNs designed using the first scoring function, the 
minimum number of perturbations necessary was found for 
CaLCuV infection (6.40) and the maximum for PPV (34.24) 
(Figure 5A). Narrower ranges (from 1.92 to 9.50) were obtained 
for all viruses when the second scoring function was used instead. 

The set of TFs proposed by the design algorithm do not necessarily 
include all those whose expression has been observed in real infec- 
tions (Figure 5A, lower panel). In general, a bootstrap test shows that 
the intersections between the lists of proposed TFs and observed 
altered TFs were not significantly enriched except in two instances. 
In the case of TEV-AU7 40.15% of the proposed TFs were altered in 
the real infection, being the proposed set significantly enriched in 
observed TFs (Figure 5A lower panel; bootstrap test, P < 0.001). This 



significant enrichment does not exist when all genes were considered 
in the scoring function. Analogously, 18.73% of the proposed TFs 
perturbed in the redesigned TRNs were contained in the list of 
observed altered TFs for TRV (Figure 5A, lower panel; bootstrap 
test, P < 0.001), and this result remains significant independently 
of the gene set weighted by the scoring function. 

Next, we observed that the lists of genes proposed by all re-designs 
were significantly enriched in biological functions related with res- 
ponse to biotic and abiotic stresses, and in developmental processes 
(summarized in Additional file 1 ). This enrichment corroborates that 
the pathological outcome of viral infection can be reproduced in 
silico in absence of the viral agent by altering the appropriated plant 
cellular programs. Hence, the plasticity of A. thaliana transcriptome 
to generate specific expression pattern as a response to multiple 
genetic perturbations suggests that non-infected cells could easily 
mimic transcriptomic responses to diverse viral infections. What 
kinds of perturbations contribute the most to this plasticity? 
Figures 5B and 5C show, respectively, the percentages of over- 
expressed and knocked out TFs for each redesigned TRNs (as before, 




Figure 5 | Statistics of the number of TFs perturbed and of the type of perturbation. (A) Number of TFs to perturb (knockout/over-expression) 
proposed by the optimization process in which only TFs or all genes are considered in the scoring function (blue and red bars, respectively, in the upper 
panel) . Grey bars show the number of TFs differentially expressed that were identified by gene expression upon infection with eight different viruses. We 
also show the intersection between the perturbed TFs proposed in the design and those with differentially altered gene expression (bottom panel). 
Random selections of TFs for designing simulating optimization processes without selective pressure were computed to test statistical significance (*** P 
< 0.001 ). (B) Percentages of over-expressed and (C) knockout TFs proposed to be perturbed in each design for the eight viruses. Error bars show SD from 
the 50 simulations. 
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blue and red bars correspond to scoring functions that use only TFs 
or all genes). Overall, re-engineered TRNs included more gene 
knockouts than over-expressions for all viruses except for the 
re-designs mimicking PPV infection, which are more balanced, indi- 
cating that gene knockouts generate more plasticity in gene express- 
ion than overexpressing genes. 

We have just shown that the list of TFs altered in the re-engi- 
neered TRNs were not necessarily the same set that have been 
shown to have altered expression in the real infections. Now, we 
want to explore whether this proposed set of TFs are located in the 
wild-type TRN close to TFs that are altered during real infections, 
as described in Figure 6A. If this is the case, then it can be argued 
that the proposed set of TFs does affect exactly the same target 
genes that may be affected by the TFs altered during viral infec- 
tion. Otherwise, it can be argued that the proposed set of TFs 
mimics infection by completely different mechanisms. To evaluate 
this question we proceeded as follows. For each proposed TF we 
evaluated a topological parameter, <Ptf> which takes into account 
the minimum undirected shortest path between this TF and all the 
TFs significantly altered by viral infection (see Methods). The 
parameter <P TP takes the value one if the proposed TF is included 
in the list of TFs altered in real infections and tends to zero as the 
distance to the closest altered TF increases in the network. 
Figures 6B and 6C show <P TF for each virus (panel B corresponds 
to scoring functions using only TFs and panel C to all genes). The 
statistical significance of <P TF was evaluated by generating random 
lists of transcription factors. Only TFs proposed in all re-designs 
of TuMV and TRV were significantly closer in the wild-type TRN 
to TFs differentially altered by viral infection (Figures 6B and 6C), 
for both scoring functions, than expected by sheer chance. 

All together, these results confirm that the re-engineered TRNs 
proposed by our computational methodology mimic the transcrip- 
tomic response observed under real viral infections by altering a 
smaller and different set of targets than those observed during the 
transcriptomic characterization of real infections. Indeed, in general, 
they are not even neighbors in the wild-type TRN. 



The number of proposed TFs to be perturbed and their connec- 
tivity degree correlates with the magnitude of the alterations in 
gene expression observed upon viral infection. The microarray 
data characterizing the infection of A. thaliana with each of the 
eight viruses shows significant variation in the number of genes 
whose expression was altered. Therefore, we hypothesized that in 
order to match the transcriptomic consequences of infection, our 
optimization algorithm shall propose more TF alterations for 
viruses that exert a large impact on the host transcriptome than for 
viruses that have a mild impact. To test this hypothesis we proceeded 
as follows. First, to collapse into a single quantity the impact of viral 
infection in the host transcriptome, we computed two different 
statistical measures, Pearson's correlation coefficient and the 
mutual information (MI) between the transcriptomic profiles of 
non-infected and infected plants. For viruses having minor impact 
in the plant transcriptome, the Pearson's coefficient will be close to 
one and the MI large. By contrast, a small correlation coefficient and 
low MI will reflect a strong perturbation in the plant's transcriptome. 
Second, we sought for associations between these indexes and the 
number of predicted altered TFs for each virus or their outgoing 
connectivity in the TRN. If our hypothesis is correct, we must 
observe significant negative correlations in these relationships. The 
largest changes in the transcriptome were found in cells infected by 
PPV, whereas the action of the CaLCuV showed the smallest 
variation in the host expression profile. Figures 7 A and 7B show 
the expected negative and significant correlations between the 
number of proposed alterations, using the scoring function based 
only in TFs, and the overall impact of the viral infection measured 
as the Pearson's correlation (Figure 7 A: r = -0.890, 6 d.f, P = 0.003) 
and as the MI (Figure 7B: r = -0.800, 6 d.f, P = 0.017). Analogously, 
Figures 7C and 7D also illustrate a significant negative correlation 
between the average outgoing connectivity of the proposed TFs and 
the overall impact of the viral infection measured as Pearson's 
correlation (Figure 7C: r = -0.789, 6 d.f, P = 0.019) and as MI 
(Figure 7D: r = -0.927, 6 d.f, P = 0.001). Additionally, we 
compared the average outgoing connectivity for the proposed TFs 
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Figure 6 | Measures of transcriptional information. (A) Scheme to illustrate the topological properties in a context of the A. thaliana TRN between the 
perturbed TFs proposed by our methodology and those identified differentially altered under viral conditions. (B, C) Transcriptional information of all 
TFs proposed to be perturbed in the designs. Note that blue and red bars show TFs proposed by using our design methodology evaluating only TFs or all 
genes in the scoring function, respectively. Random undirected shortest paths were computed to evaluate the statistical significance of the topological 
distance between the TFs for designing with respect TFs selected randomly in the A. thaliana interactome (*** P< 0.001). Error bars show SD from the 50 
simulations. 



SCIENTIFIC REPORTS | 2 : 1006 | DOI: 1 0. 1 038/srepOl 006 



5 



(a)£ 50 

H 




(e) 


1.0 


)gical 


0.8 


p, o 
3 a 


0.6 


0) o 
> P J3 


0.4 


o 


0.2 


i 

IB 


0.0 



(g) 



0.8 0.85 0.9 0.95 1 
transcriptomic Pearson correlation 
140 1 



0.5 1 1.5 2 

transcriptomic mutual information 



120 
100 

80 

60 - 

40 

20 
0 



rh 



+ 



rh 



rh 



f+1 rih 



[+1 rh 



PPV 



TMV 



TEV TuMV TEV-AU7 TCV 



TRV CaLCuV TRN 



Figure 7 | Complexity of the proposed solutions. (A, B) Relationship between the number of TFs proposed by the design algorithm to be perturbed, (C, 
D) their connectivity degree and (E, F) a combination of both variables, and the correlation (evaluated as Pearson correlation (A, C and E) or mutual 
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for each virus with respect to the average of all TFs of A. thaliana 
TRN (Figure 7G). Only in the case of PPV, the proposed TFs regulate 
significantly more genes than expected from random samples in the 
TRN. We could hypothesize that an optimal way to perturb the 
global transcriptome, thus mimicking the virus effect would be to 
alter transcriptional hubs. 

Next, we hypothesized a relationship between a new parameter, 
which collapses the two previous characteristics of the proposed TFs 
into a single figure, *F V , and the overall impact of the viral infection. 
Interestingly, we computed high and significant correlations by using 
the Pearson's correlation (Figure 7E: r = —0.936, 6 d.f., P = 0.001) or 
the MI (Figure 7F: r = -0.900, 6 d.f., P = 0.002) as measurements of 
impact of the viral infection in transcriptome. 

Hence, we could conclude that the amount and type (connectivity 
degree) of proposed perturbations needed to mimic a viral infection 
is directly dependent on the overall effect that the virus exerts on the 



host transcriptome: the larger the effect, the more perturbations are 
needed and vice versa. 

A crucial set of TFs is pervasively proposed in the redesigned 
TRNs. For each virus, we have run 50 independent optimization 
processes. Do all of them represent completely different solutions? 
Or by contrast, do all of them contain a preferred set of TFs? If the 
second situation is true, then we can suggest that the over- 
represented TFs represent a critical set of highly relevant TFs that, 
eventually, may be the focus of future experimental validation. 

For each virus, we tabulated the 50 lists of proposed TFs and 
calculated the probability of finding each TF in these lists. This 
probability can be taken as a measure of the design robustness of 
the prediction for each TF. A TF with a very low design robust- 
ness (e.g., < 25/50) means that it may have very low relevance in 
mimicking the transcriptomic profile induced by the virus. By 
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contrast, a TF with high design robustness (e.g., > 25/50) will be 
indicative that such TF plays a central role in mimicking the 
transcriptomic symptoms. Figure 8 (blue bars) shows such degree 
of design robustness for perturbations of all TFs for the eight 
viruses. In this case the scoring function used in the optimization 
process only accounted for TFs. The different simulations did not 
share most of the TFs proposed. However, a certain number of 
TFs were common for more than half of the simulation. This 
number of crucial TFs varies among viruses, ranging from zero 
(TRV) to 19 (PPV). Analogously, Figure S3 also shows the equi- 
valent results but weighting all genes in the scoring function. In 
this case the number of TFs that we can identify as crucial is much 
lower: Atlg43700 and Atlg24625 (TMV), Atlg24625 (TRV) and 
Atlgl7880 (TuMV) related to DNA mediated transformation, cel- 
lular response to sulfate starvation, negative regulation of cell 
differentiation, nuclear import, regulation of response to stress, 
sulfate transport and response to salt stress. As a way to evaluate 
the statistical significance of these results, we generated, for each 
virus, a new set of 50 simulations but without using the match in 
expression profile (S exp ) in the scoring function. The distribution 
of design robustness for these TRNs is shown on Figure 8 (red 
bars). In all cases, the distributions of design robustness did not 
reach values larger than 0.2, thus confirming that the existence of 
critical TFs could not be explained by chance. For each virus, the 
centrality and shape of both distributions were compared, and 
found significantly different (Mann-Whitney test, P < 0.001; 
Kolgomorov-Smirnov test, P < 0.001). Table 1 shows the lists 
of all TFs defined as critical for each virus. Interestingly, among 
the 47 TFs proposed, 22 are involved in several developmental 
processes; nine in responses to biotic and abiotic stresses and the 
rest do not have been assigned to any specific function. This 
enrichment in TFs involved in development may be consistent 
with the symptoms induced upon viral infection, which in most 
cases involve dwarfism, leaf malformations and curling and delays 
in the emergence and development of inflorescences. 

Proposed TFs common for different viruses. Finally, we explored 
the overlap between the lists of TFs whose expression was altered in 
the TRNs designed for the different viruses. For each virus, this list 
includes all the TFs proposed at least in one of the 50 simulations (i.e., 



not restricting the analyses to the critical set described in the above 
paragraph). As in previous sections, the lists of TFs were build both 
using the scoring functions based on TFs and in all genes. Figure 9A 
shows the number of TFs that are found in the lists of one or more 
viruses (blue and red bars corresponding to whether only TFs or all 
genes were accounted for in the scoring function). Specifically, 
around 700 TFs were virus-specific, but a large fraction (over 400) 
was shared by at least two viruses. Among these, ca. 200 TFs were 
shared by at least three viruses. In the right side of the distribution, we 
found 16 TFs shared by seven viruses. Finally, all viruses only share 
three TFs: Atlg50640, At2g35940 and At2g37650. Atlg50640 
corresponds to the ethylene-responsive transcription factor 3 
(ERF3) that negatively regulates the ethylene-mediated signaling 
pathway and gene transcription. At2g35940 corresponds to the 
BEL 1 -like homeodomain 1 protein (BLH1) that regulates trans- 
cription in response to abscisic acid stimulus. At2g37650 encodes 
for a TF of the GRAS family involved in root and leaf development 
and in the negative regulation of flower development. Hence, the 
number of TFs identified for an increasing number of viruses 
decreases exponentially (r = 0.969, 7 d.f, P < 0.0001), showing 
that there is a significant relationship between the number of TFs 
identified for one virus compared to another. In addition, since 
transcriptomics shows that the hosts tend to respond in a similar 
way to different pathogens, we identified a high number of TFs in 
common between multiple viruses with respect to list of TFs 
generated randomly (grey bars in Figure 9A). This would further 
support the proposal that TFs are not selected randomly and 
consequently, our methodology identified a specific set of TFs that 
mimic the common effects of virus infection. 

The overlap between pairs of lists was further quantified using the 
similarity index 2n xy /(n x + n y ), where n xy is the number of common 
entries in the two lists and n x and n y the length of each list. A 
similarity matrix containing all pairwise comparisons was con- 
structed and used to build neighbor-joining dendograms that cluster 
together viruses according to the similarity of their proposed lists of 
TFs (Figure 9B and figure 9C, using the scoring function based only 
on TFs or considering all genes, respectively). Three groups result 
when the first scoring function was used (Figure 9B). The first group 
is formed by PPV, TEV and TCV, the second group by TRV, TuMV, 
CaLCuV, and TEV-AU7, while TMV appears as the most dissimilar 
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Figure 8 | Histogram of the robustness of the TFs proposed by the design algorithm for the eight viruses (blue bars). Only TFs were considered in the 
scoring function for designing. TFs with design robustness > 0.5 are described in Table 1. The red bar histograms correspond to the robustness of TFs 
selected from randomly re-engineered TRNs. 
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Table 1 TFs proposed to be perturbed by the design methodology with a high robustness degree* 


Gene 


Gene Annotation 


Biological functions* 


CaLCuV 


At5g59820 


Zinc finger protein 


Cold acclimation, hyperosmotic salinity response, photosynthetic acclimation, response to 
UV-B, response to chitin, response to cold, response to heat, response to light stimulus, 
response to oxidative stress and response to wounding 


TEV-Afl 7 


At5g53200 
At5gll590 
Atlg66380 

Atlg01250 


Member of the DREB subfamily 
Involved in regulation of anthocyanin 

biosynthesis 
Member of the DREB subfamily 


Trichome branching 

+ 

Regulation of anthocyanin biosynthetic process and response to jasmonic acid stimulus 

+ 


PPV 


At5g20730 

At5g27610 
At5g64060 
At5g62165 
At5g58890 
At5g57420 
At5g53950 

At5g52660 

At5g50820 
At5gW140 

At4g31805 
At4g02670 
At3g25940 
At3g07610 

At3g01080 
Atlg66230 
Atlg62310 
At5g20730 


Encodes an auxin regulated activator 

Encodes a MADS box TF 

Belongs to auxin inducible gene family 
Transcriptional activator of the NAC 
gene family 

Homeodoma in-like superfamily protein 

NAC domain containing protein 97 
MADS-box protein 

WRKY family 

TFIIB zinc-binding protein 
IBM1 

Member of WRKY TF 
TF jumonji 

Encodes an auxin-regulated activator 


Blue light signaling pathway, gravitropism, lateral root development, lateral root 

primordium development, leaf development, phototropism, response to auxin stimulus 
and response to ethylene stimulus 

+ 

Multicellular organismal development 

Abscission, leaf senescence and regulation of flower development 

+ 

Response to auxin stimulus 

Formation of organ boundary, leaf development, multicellular organismal development, 

primary shoot apical meristem specification, regulation of timing of organ formation and 

secondary shoot formation 
Response to abscisic acid stimulus, response to cadmium ion, response to ethylene stimulus, 

response to gibberellin stimulus, response to jasmonic acid stimulus, response to salicylic 

acid stimulus and response to salt stress 
Multicellular organismal development 

Negative regulation of flower development, regulation of circadian rhythm, response to 
temperature stimulus and vernalization response 

+ 
+ 
+ 

DNA methylation on cytosine, flower development, histone H3-K9 demethylation, leaf 

development and pollen development 
Regulation of defense response 
Regulation of secondary cell wall biogenesis 

+ 

Blue light signaling pathway, gravitropism, lateral root development, lateral root 
primordium development, leaf development, phototropism, regulation of growth, 
response to auxin stimulus and response to ethylene stimulus 


TCV 


Atlg59930 


Encodes a maternally expressed 
imprinted gene 


+ 


TEV 


At2g21650 

At2g25230 
Atlg31050 


RSM1 is a member of 
a family ofMYB TFs 

Basic helix-loop-helix DNA-binding 
family protein 


Embryo development ending in seed dormancy, gravitropism and response to red light 

+ 
+ 


TMV 


At2g31180 

At5g49330 
At3g57390 

Atlgl8710 
Atlg25250 
At5g55390 


Member of the R2R3 factor gene family 

Member of the R2R3 factor gene family 
Encodes a MADS-box 

Member of the R2R3 factor gene family 

Encodes enhanced downy mildew 2 


Response to auxin stimulus, response to ethylene stimulus, response to jasmonic acid 

stimulus, response to salicylic acid stimulus and response to salt stress 
Flavonol biosynthetic process 

Negative regulation of flower development, negative regulation of short-day 

photoperiodism and flowering 
Response to jasmonic acid stimulus and response to salt stress 

+ 

Defense response to fungus, positive regulation of flower development, signal transduction 
and vegetative to reproductive phase transition of meristem 
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Table 1 | continued 


Gene 


(j7ene Annotation 


Biological functions* 


At5g 13790 


Member of the MADS 


Cellular response to auxin stimulus, embryo development ending in seed dormancy, fruit 






abscission, fruit dehiscence, gibberellin catabolic process, negative regulation of floral 






organ abscission, negative regulation of flower development, negative regulation of seed 






maturation, negative regulation of short-day photoperiodism, flowering and somatic 






embryogenesis 


At3g60390 


Encodes homeobox protein HAT3 


+ 


At3gl6500 


Phytochrome-associated protein 1 


Response to auxin stimulus 


At3g 16857 


Encodes an Arabidops'ts response 


Cvtokinin mediated sianalina pathway, primary root development, reaulation of 




regulator protein 


anthocyanin metabolic process, reaulation of chlorophyll biosynthetic process, response 






to cytokinin stimulus and shoot development 


At3g03660 


Encodes a WUSCHEL-related 


+ 




homeobox gene 




Atlg68120 


Basic pentacysteine 3 


+ 


Atlg70510 


A member of class 1 knotted 1 -like 


Cvtokinin mediated sianalina pathway response to ethylene stimulus and specification of 




homeobox 


carpel identity 


Atlg28420 


Homeobox- 1 


+ 


Atlg56170 




+ 


Atlgl0200 


Encodes a member of the Arabidops'ts 


Actin filament bundle assembly 




LIM proteins 




Atlg27050 


Encodes a protein with a RNA 


+ 




recognition motif 




A\2g 18350 


Homeobox protein 24 


+ 


At2g45160 


Belongs to one of the LOM genes 


Cell differentiation, cell division, maintenance of shoot apical meristem identity and root hair 






cell tip growth 


TuMV 


At5g07690 


Encodes a putative TF MYB29 


Defense response to fungus 


f Degree of desic 


n robustness > 0.5. 




+ All genes showed biological functions related to regulation of transcripti 


on and DNA-dependent. 


Notice that all ge 


nes showed were proposed in designs in which the scor 


ng function evaluated only TFs in the optimization process. 



virus. Given the nature of the data used to build up this dendogram 
(i.e., similarities among lists of TFs whose alteration mimic the symp- 
toms of infection), not surprisingly, the clustering does not reflect 
phylogenetic relationships between viruses (PPV, TEV, TEV-Afl7, 
and TuMV are all classified within the same taxonomic genus, the 
Potyvirus) nor whether they share common hosts in nature (TCV, 
TuMV, CaLCuV, and TEV-Arl7 infect either A. thaliana or related 
hosts in the Brassicaceae). By contrast, when the scoring function 
used in the optimization process takes into account all genes, the 
dendogram obtained clusters together all the A. thaliana-iniecting 
viruses (Figure 9C). Thus suggesting that host-driven selection may 
have determined the set of genes that viruses infecting the same host 
manipulate to optimize their replication. 

Discussion 

The ultimate aim of synthetic biology is de novo construction of an 
organism by combining standardized parts 33 . More modest aims are 
to re-design already existing biological systems, by adding new ele- 
ments or rewiring existing ones, to produce certain new metabolites 
or to improve certain physiological processes 34,35 . Despite being 
modest, these are very challenging goals that will clearly gain from 
the development of computational tools that will help to a rational in 
silico design of new circuits, allowing comparing them, analyzing 
their dynamical properties, and selecting the optimal one to be finally 
implemented in vivo 29i6 . Here, we take a computational synthetic 
biology approach to tackle a somehow different and new problem: 
identifying sets of cellular factors whose perturbation results in the 
same transcriptomic pattern that is observed in plant cells infected by 
viruses. This approach is opposite to the usual research pathway 
taken by molecular virologists that consists in first identifying cell 
proteins that interact with viral proteins and that result in the typical 
symptoms of an infection 81718 . Our approach compares the tran- 
scriptomic profiles of non-infected plants with those observed for 



virus-infected plants and searches in silico for those minimal sets of 
TFs or genes whose alteration (knockout or overexpression) results 
in a predicted transcriptomic pattern that minimally deviates from 
the observed in infected plants. We have performed this computa- 
tional search for a set of eight plant viruses all infecting the same host, 
A. thaliana. 

Plants have evolved defense mechanisms to recognize pathogens 
and defeat viruses, but viruses have developed elements that interfere 
and suppress these mechanisms. The computational methodology 
proposed in this article allows exploring the plasticity of A. thaliana 
transcriptome in response to the alteration of certain key TFs. 
Although our approach only proposed synthetic perturbations in 
TFs, it has been established that some viruses affect the expression 
and activity of enzymes. A good example of such alteration in non- 
TFs is the effect of geminivirus infection on the activity of kinases 
such as adenosine kinase 37 39 , SNFl-related kinases 40 and shaggy- 
related kinase AtSKeta 41 . We have addressed the problem of re- 
design a plant TRN to mimic the transcriptomic response observed 
upon viral infection but in absence of any intracellular pathogen. In 
the case of eight different viruses, our methodology rendered re- 
engineered TRNs that, in silico, captured the transcriptional res- 
ponses of the infected host A. thaliana. Interestingly, we found that 
for all eight viruses, the optimization process resulted in a significant 
improvement in the similarity between observed and simulated data. 
Surprisingly, this high degree of mimicking was reached by manip- 
ulating a reduced number of TFs associated to developmental pro- 
cesses and to responses to biotic and abiotic stresses. In other words, 
our method is not trivially identifying TFs that were already affected 
by the viruses but different sets of TFs whose coordinated alteration 
result in a similar transcriptomic pattern. Note that in the collected 
microarray experiments, many TFs are expressed at levels undistin- 
guishable from the background. Thus, the minimal set of TFs pro- 
posed could be biased towards those expressed at higher levels. 
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Figure 9 | Commonalities among viruses in the proposed TFs. (A) Number of common TFs proposed to be perturbed by the model (blue and red bars) 
and by 1000 random simulations (grey bars) for several viruses. (B, C) Neighbor-joining dendograms obtained from the similarity matrix computed from 
overlapping lists of TFs proposed to be perturbed in the different designs for the eight viruses. Note that only TFs ((B) and blue bars in (A)) or all genes 
( (C) and red bars in (A) ) were considered in the scoring function for designing. 



However, although this may certainly be a potential source of bias, we 
believe it is not of importance because (;') our lists would still repres- 
ent a conservative set of TFs and (if) TFs with low changes in express- 
ion across the microarray compendium may still be associated to 
high values of its corresponding regulatory effect parameters 28 (see 
Methods section). Other potential source of bias comes from the fact 
that only approximately 75% of all A. thaliana TFs have been used in 
our study, those that appear annotated as such in TAIR, thus we may 
be missing some relevant ones. This deficiency may be solved in the 
future as annotations in databases are curated and improved. 

Virus infections are dynamic processes, and the reprogramming of 
the transcriptomic profile may undergo important changes along the 
curse of infection. Since the datasets used in this study correspond to 
samples taken at a single time point, the results may be somehow 
biased towards TFs affecting late-response genes rather than to those 
being altered at early stages of the infection and that, not unlikely, 
may correspond to defense-related genes. However, it is worth not- 
ing that in a previous meta-analysis of the same experimental 
data used here 19 , we failed to detect significant differences in gene 
expression when experiments were arbitrarily ranked according to 
the day in which samples were taken (i.e., early, intermediate and 
late). At face value, this result suggested that observed differences in 
gene expression among the eight viruses used in this study are not 
explicable by the time at which samples were taken. Nonetheless, as 
more transcriptomic data become available in public databases, espe- 
cially data from temporal samples, we will address dynamic model- 
ing of TRN rewiring. 

As we expected, the complexity of the redesigned TRNs, in terms 
of number of necessary TF to be perturbed, correlated with the 
amount of change induced by each viral infection to the transcrip- 
tome of infected plants. In other words, the more differences exist 
between non-infected and infected plants, the more TFs need to be 
altered to minimize differences between both transcriptomic profiles. 
In addition, we found certain degree of overlap between the TFs 



selected in each run of the optimization algorithm, providing evid- 
ence that a set of essential TFs is able to generate high plasticity in 
gene expression of the plant. This is to say, these TFs are not trivially 
selected but instead are keystone in maximizing the similarity 
between simulated and observed transcriptomic profiles. Finally, 
we found that some TFs were shared among different viruses in a 
phylogenetic-independent manner. That is, viruses do not group 
according to their phylogenetic relatedness but according to the list 
of altered TFs in common. We interpret this observation as a con- 
sequence of the optimization process that identifies core TFs 
responsible for the big alterations in transcriptomic profiles. 

Experimental validation of our computational predictions for each 
virus requires the generation of mutant A. thaliana lines with altered 
expression of the TFs listed in Table 1. This represents a considerable 
amount of work beyond the scope of this manuscript. However, for 
CaLCuV, TCV and TuMV our method has robustly suggested that 
the alteration of a single TF may be enough to mimic viral infection; 
giving some hope validating our predictions by exploring available 
information for these TFs. If alteration of a single TF translates into 
large transcriptomic and/or phenotypic effects, then that would 
at least confirm that our method correctly selects for changes 
potentially mimicking the consequences of viral infection. By con- 
trast, if the effect of these proposed alterations is limited, then some 
warning flags should be raised against the value of our predictions. In 
the case of CaLCuV, At5g59820 encodes for a zinc finger protein 
involved in the response to several stress responses, notably to high 
light and cold acclimation. It is expressed in multiple tissues and at 
different stages of development. Overexpression of this TF has a 
cascade effect and alters the expression levels of other 24 cold- 
responsive genes, including C-repeat binding factor (CBF) genes 42,43 . 
In the case of TCV, the Atlg59930 TF is involved in the regulation of 
expression of maternally imprinted genes 44 . Currently, ca. 50 
imprinted A. thaliana genes are known. These genes encode for 
proteins that are implicated in a wide range of functions and 
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processes, including other TFs, proteins involved in hormone signal- 
ing, components of the ubiquitin protein degradation pathway, reg- 
ulators of histone and DNA methylation, and small RNA pathway 
proteins 44 . Finally, At5g07690 predicted for TuMV encodes for an 
R2R3 MYB29 TF expressed at many developmental stages (including 
leaf senescence, and petal differentiation and expansion) and in 
many different tissues. MYB29, together with MYB28, is involved 
in the methyl jasmonate-mediated induction of a set of aliphatic 
glucosinolates (GSLs) biosynthetic genes 45-46 . GSLs are well known 
for their role as signaling molecules in plant defense against herbiv- 
ory, wounding, induction of systemic resistance, responses to other 
hormones and, overall, affect the adaptation of plants to variable 
environments 47,48 . In conclusion, even for the most simple situations 
described in Table 1, alteration of the predicted TFs results in a 
cascade of effects, thus confirming that our method is not predicting 
TFs at random but precisely those which are central for multiple 
processes. 

This computational work, identifying reduced sets of TFs that 
result in mimicking the symptoms of infection, opens the doors to 
future experiments that may use the A. thaliana gene knock out 
collections not only to validate our prediction but also as a way of 
reach a better understanding of the molecular mechanisms of viral 
pathology. Even perhaps as potential targets for future therapeutic 
interventions. In this sense, the overexpression of one or several of 
the core TFs suggested by our study may result in tolerant plants, 
which will not develop symptoms despite being infected. Transgenic 
overexpression of these key TFs may compensate for the reduction in 
concentration forced by viral infections thus significantly reducing 
symptoms. 

Methods 

Transcriptional regulatory network of Arabidopsis thaliana. A linear model 
based on ODEs providing the steady states of each mRNA was used to describe 
transcriptional gene regulations 28 . Transcriptomic data were normalized and 
represented in log-scale. Thus, the mRNA steady state from the i th gene, g h is given 
by 



dy,/dt=a i + ^ft^-^, 



(1) 



where a { is its constitutive transcription rate, is the regulatory effect that gene j has 
on gene i and <5, is the degradation coefficient. Time was conveniently scaled such that 
<5, — 1 and the model is assumed in steady state (y,- — a, + jE?^). Notice that each 
gene expression is bounded in a range interval defined by the minimum iy™* 1 ) and 
maximum (y| mx ) value of all its conditions obtained from the 1436 hybridization 
experiments collected in the TAIR website (www.arabidopsis.org). To calibrate TF 
expression, the newly redefined constitutive transcription rate a) — a, + s, included a 
perturbative term (e f ) that fit only the TF expression profile (y opt ) for a defined 
optimal condition in which cell growth is maximal. 

Plant viruses and transcriptomic data. Seven positive -sense single -stranded RNA 
viruses and one virus whose genome is composed by a single-stranded circular 
ambisense DNA molecule all infecting a common plant host, A. thaliana, were 
selected. The set of RNA viruses comprises three members of the Potyviridae family, 
Tobacco etch potyvirus (TEV), Turnip mosaic potyvirus (TuMV) and Plum pox 
potyvirus (PPV), two members of the family Virgaviridae, Tobacco mosaic 
tobamovirus (TMV) and Tobacco rattle tobravirus (TRV), and one member of the 
Tombusviridae family, Turnip crinkle carmovirus (TCV). We also considered a 
laboratory- strain of TEV (TEV-Afl7) evolved in and adapted to A. thaliana 10 . The 
ssDNA virus included in the study was the member of the Geminiviridae family 
Cabbage leaf curl begomovirus (CaLCuV). 

TEV and TEV-Afl7 expression data (two-color raw data, NCBI GEO accession 
GSE11088) were obtained from ecotype Ler-0 plants 14 days post -inoculation 
(dpi) 9,10 . TuMV data (Affymetrix raw data, ArrayExpress accession e-mexp-509) 
were obtained 5 dpi from ecotype Col-0 plants 16 . PPV data (Affymetrix prepro- 
cessed data, NCBI GEO accession GSE11217) were obtained 17 dpi from Col-0 
plants 12 . TMV data (two-color raw data, deposited in www.bio.puc.cl/labs/arce/ 
index.html) were obtained from ecotype Uk-4 plants 10 dpi 14 . TRV data (two- 
color raw data), NCBI GEO accession GSE15557, GSE155562 and GSE15558) 
were measured eight dpi from Col-0 leafs. TCV data (two-color raw data, NCBI 
GEO accession GSE29387) were quantified 10 dpi in Col-0 plants. Finally, 
CaLCuV data (Affymetrix raw data, ArrayExpress accession E-ATMX-34) were 
collected from Col-0 plants 12 dpi 11 . The list of differentially expressed genes was 
obtained by performing a fold-change criterion of z > 1.96 over all genes 



(averaging replicates). The effect of heterogeneity in the host ecotype used in 
different experiments (Ler-0 for TEV and TEV-Afl7, Uk-4 for TMV and Col-0 for 
the rest) and in the time at which samples were obtained (ranging from 5 to 17 
dpi) has been explored else where 19 . 

Genome-wide multiple-optimization. Our algorithm searches possible 
reconfigurations of the global transcription regulatory network such as that the 
expression profile of the re-engineered genome mimics the transcriptional response 
of the host infected by different viruses. We address this problem by using a high 
efficient heuristic optimization. We suggest genome reconfigurations that include 
multiple knockouts or over- expression among the 1 187 TFs identified in the A. 
thaliana genome, by imposing in the model yrF —yrp 1 or /tp*' respectively, or both 
types of perturbations in order to enlarge the combinatorial space of perturbed 
genomes targeting the transcriptional response given under viral infection. We 
started from the inferred model and at each step in the optimization process, we 
randomly selected a new TF to evaluate its three states (knockout, over- expression or 
wild-type) and update the model with the best-scored scenario until all TFs have been 
manipulated. 

We use a bi-objective scoring function, <j), to force a minimum average deviation to 
the viral infection expression profile in steady state {S exp ) with the minimum number 
of genetic perturbations {S pert ), 



where 



and 



<j> = 3S exp + (l-$)S pt 



N> 
Ntf 



(2) 



(3) 



(4) 



Q is the parameter that defines the weighting factor of each objective function (S = 0.5 
for all simulations),^ is the expression profile under viral infection of all genes (N g ) of 
A. thaliana and X g <= (0, 1) is a parameter defined for each gene that differences those 
genes differentially expressed in the microarray data measured under viral conditions. 
In fact, we can divide the objective function representing the expression score into two 
terms: a first that quantifies expression deviations from genes that have been iden- 
tified as altered in the transcription and a second to compute the rest of genes. We 
imposed /L = 1 for minimizing the fact that genes differentially expressed in the 
microarrays could have more importance in S exp . N TF is the total number of TFs 
annotated in A. thaliana and Njp is the number of TFs that the model suggests to be 
perturbed. 

Transcriptional information. First, we computed the shortest paths SrF-g> in the 
plant TRN, between a TF (among the set of proposed perturbations in a given design) 
and a set of TFs potentially affected by global changes in gene expression, g (genes 
differentially altered by viral infection in the microarray). Hence, to measure the 
degree of contribution that a TF has to mimic a target gene expression of the TRN of 
A. thaliana, we defined a parameter 



Or F = 1/(1+ min((5 TF -g) ) 



(5) 



that evaluates the minimum shortest path (i.e., the minimal number of links between 
a TF and the genes differentially expressed under viral conditions). 

Topological parameter of TRN designed. To characterize the impact of the set of 
TFs proposed in the TRN of A. thaliana, we defined a topological parameter for each 
viral infection, v, 



W V = A- 



jpert 
v Tf,V 

maxy [N^p V 



pet 

Vtf 



[Ttp.v I 



(6) 



that evaluates the number of TFs proposed, N^ F , and the average outgoing con- 
nectivity of such TFs, y v , relative to the virus with maximum N^T or y, respectively. 
Note that we fixed the weighting factor, X = 0.5, to provide the same importance to 
both topological features. 
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